Abstract

A statistical test of independence may be constructed using the Hilbert- 
Schmidt Independence Criterion (HSIC) as a test statistic. The HSIC is 
defined as the distance between the embedding of the joint distribution, 
and the embedding of the product of the marginals, in a Reproducing 
Kernel Hilbert Space (RKHS). It has previously been shown that when 
the kernel used in defining the joint embedding is characteristic (that is,
the embedding of the joint distribution to the feature space is injective), 
then the HSIC-based test is consistent. In particular, it is sufficient for the 
product of kernels on the individual domains to be characteristic on the 
joint domain. In this note, it is established via a result of Lyons (2013) 
that HSIC-based independence tests are consistent when kernels on the 
marginals are characteristic on their respective domains, even when the 
product of kernels is not characteristic on the joint domain. 

1 Introduction 

The Hilbert-Schmidt Independence Criterion (HSIC) [5] provides a measure of
dependence between random variables $X$ on domain $\mathcal{X}$ and $Y$ on
domain $\mathcal{Y}$, with joint probability measure $P_{XY}$ on
$\mathcal{X} \times \mathcal{Y}$. This dependence measure may be used in
statistical tests of dependence [?]. The simplest way to understand HSIC is as
the distance between the embeddings of the joint distribution and of the
product of the marginals in an appropriate feature space [?], which is in our
case a reproducing kernel Hilbert space. The distance covariance of [?] is a
special case, for a particular choice of kernel [?]. We say the feature space
is characteristic when the embedding is injective, and so uniquely identifies
probability measures. A test based on HSIC is consistent when the product of
kernels on the domains being compared is characteristic to the joint domain
[?, Theorem 3]. This is shown to be the case, e.g., when Gaussian kernels are
used on each of the domains.
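As a concrete point of reference, the following is a minimal sketch of the
standard biased empirical HSIC estimate, trace(K H L H) / n^2, with Gaussian
kernels on each domain. The function names, the unit bandwidth, and the toy
data are assumptions made for illustration and do not come from this note.

```python
import numpy as np

def gaussian_gram(z, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    sq_norms = np.sum(z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def hsic_biased(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2

# Toy data (hypothetical): Y depends on X through a non-linear map.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_dependent = x ** 2 + 0.1 * rng.normal(size=300)
y_independent = rng.normal(size=300)

K = gaussian_gram(x)
print("dependent pair:  ", hsic_biased(K, gaussian_gram(y_dependent)))
print("independent pair:", hsic_biased(K, gaussian_gram(y_independent)))
```

A test would compare the statistic against a null distribution, for instance
one obtained by permuting one of the samples; that step is omitted here.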

We propose a simpler condition: namely, that the kernels on each of the
individual domains $\mathcal{X}$ and $\mathcal{Y}$ should be characteristic to
those domains. The result is a direct consequence of [7, Lemma 3.8]. The result
is of particular interest since it may be easier to define characteristic
kernels on individual domains than on the joint domain. For example,
characteristic kernels may be defined on the group of orthogonal matrices
[?, Section 4], and on the semigroup of vectors of non-negative reals
[?, Section 5]; however, a kernel jointly characteristic to both domains
(i.e., to orthogonal matrix/non-negative vector pairs) is harder to define.


2 Results 

We begin with a result from [?] that characteristic, translation invariant
kernels provide injective embeddings of finite signed measures.

Proposition 1 (Injective embeddings of finite signed measures). Let
$\mathcal{X}$ be a Polish, locally compact Hausdorff space. Let $k(x,y)$ be a
$c_0$-kernel, i.e. a bounded kernel for which $k(x,\cdot) \in C_0(\mathcal{X})$
for all $x \in \mathcal{X}$, where $C_0(\mathcal{X})$ is the class of
continuous functions on $\mathcal{X}$ that vanish at infinity.^1 Assume
$k(x,y) = k(x-y)$, i.e. the kernel is translation invariant. Define as
$\mathcal{F}$ the RKHS induced by $k$. The following statements are
equivalent:

1. $k$ is characteristic.

2. The embedding of a finite signed Borel measure
$\mu \in M_b(\mathcal{X})$, defined as

$$\mu \mapsto \int_{\mathcal{X}} k(\cdot, x) \, d\mu(x), \qquad (1)$$

is injective.

This result may be obtained by combining [10, Proposition 2], which states
that an RKHS is $c_0$-universal iff the embedding in (1) is injective, with the
result in [?, Section 3.2] that translation invariant kernels are
$c_0$-universal iff they are characteristic.
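To make the notion of an injective embedding concrete, the sketch below
compares the empirical embeddings of two samples that share a mean but differ
in spread. The linear kernel is used only as a familiar non-characteristic
kernel (it is neither bounded nor translation invariant, so it lies outside
the setting of Proposition 1); the samples, the unit bandwidth, and all
function names are assumptions chosen for illustration.

```python
import numpy as np

def embedding_distance_sq(x, y, kernel):
    """Squared RKHS distance between the empirical embeddings of x and y."""
    kxx = kernel(x[:, None], x[None, :]).mean()
    kyy = kernel(y[:, None], y[None, :]).mean()
    kxy = kernel(x[:, None], y[None, :]).mean()
    return float(kxx + kyy - 2.0 * kxy)

def linear_kernel(a, b):
    # Embeds a measure as its mean only, so it cannot separate equal-mean samples.
    return a * b

def gaussian_kernel(a, b):
    # Gaussian kernel with unit bandwidth, a standard characteristic kernel on R.
    return np.exp(-0.5 * (a - b) ** 2)

x = np.linspace(-1.0, 1.0, 201)  # symmetric sample with mean zero
y = 3.0 * x                      # same mean, three times the spread

d_lin = embedding_distance_sq(x, y, linear_kernel)
d_gauss = embedding_distance_sq(x, y, gaussian_kernel)
print("linear kernel:  ", d_lin)
print("Gaussian kernel:", d_gauss)
```

The difference of the two empirical measures is a non-zero finite signed
measure; the linear kernel maps it to (numerically) zero, while the Gaussian
kernel does not.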

Given Proposition 1, a minor adaptation of the proof of [7, Lemma 3.8] leads
to the following result.

Theorem 2 (Characteristic kernels and independence measures). Let $k$ and $l$
be kernels for the respective RKHSs $\mathcal{F}$ on $\mathcal{X}$ and
$\mathcal{G}$ on $\mathcal{Y}$, with respective feature maps $\phi$ and
$\psi$. Assume both $k$ and $l$ are characteristic, translation invariant
$c_0$-kernels, satisfying the conditions of Proposition 1. Define the finite
signed measure

$$\theta := P_{XY} - P_X P_Y.$$

Define the covariance operator as the embedding of this signed measure into the
tensor space^2 $\mathcal{G} \otimes \mathcal{F}$,

$$C_{yx} = \int_{\mathcal{X} \times \mathcal{Y}} \psi(y) \otimes \phi(x) \, d\theta(x,y).$$

Then $C_{yx} = 0$ iff $\theta = 0$.
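For reference, the squared Hilbert-Schmidt norm of the covariance operator can
be written purely in terms of the two kernels, since the inner product of
$\psi(y) \otimes \phi(x)$ with $\psi(y') \otimes \phi(x')$ equals
$k(x,x') \, l(y,y')$. The expansion below is a sketch under the assumptions of
Theorem 2 (bounded kernels, so that all integrals are finite). Here $(X', Y')$
denotes an independent copy of $(X, Y)$, a notation introduced only for this
display.

```latex
\begin{aligned}
\|C_{yx}\|_{HS}^2
&= \int \int k(x,x') \, l(y,y') \, d\theta(x,y) \, d\theta(x',y') \\
&= E_{XY} E_{X'Y'} \left[ k(X,X') \, l(Y,Y') \right]
 - 2 \, E_{XY} \left[ E_{X'} k(X,X') \, E_{Y'} l(Y,Y') \right] \\
&\quad + E_X E_{X'} k(X,X') \; E_Y E_{Y'} l(Y,Y').
\end{aligned}
```

Theorem 2 states that this quantity vanishes only when $\theta = 0$, that is,
only when $X$ and $Y$ are independent.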

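^1 Continuous functions vanishing at infinity are functions
$f \in C(\mathcal{X})$ such that for all $\epsilon > 0$ the set
$\{x : |f(x)| \ge \epsilon\}$ is compact.

^2 The tensor product is defined such that
$(a \otimes b) c = \langle b, c \rangle_{\mathcal{G}} \, a$ for all
$a \in \mathcal{F}$ and $b, c \in \mathcal{G}$.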
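Proof. The result $\theta = 0 \implies C_{yx} = 0$ is straightforward. We now
prove the other direction, assuming $C_{yx} = 0$. For every
$f \in \mathcal{F}$ and $B \in \sigma(\mathcal{Y})$, we define the finite
signed Borel measure

$$\nu_f(B) = \int_{\mathcal{X} \times \mathcal{Y}} \langle \phi(x), f \rangle_{\mathcal{F}} \, I_B(y) \, d\theta(x,y),$$

where $I_B(\cdot)$ is the indicator of the set $B$. The embedding of this
measure to $\mathcal{G}$ is injective, and is written

$$\begin{aligned}
\mu_{\nu_f} &= \int \psi(y) \, \langle \phi(x), f \rangle_{\mathcal{F}} \, d\theta(x,y) \\
&= \int \left( \psi(y) \otimes \phi(x) \right) f \, d\theta(x,y) \\
&= \left( \int \psi(y) \otimes \phi(x) \, d\theta(x,y) \right) f \\
&= C_{yx} f = 0,
\end{aligned}$$

where we have used the linearity of the tensor product,
$(a \otimes b) c = T_c (a \otimes b) = \langle b, c \rangle \, a$.

Since the embedding is injective and $\mu_{\nu_f} = 0$, we have that
$\nu_f = 0$. Since this is true for all $f \in \mathcal{F}$, we have that

$$\int_{\mathcal{X} \times \mathcal{Y}} \phi(x) \, I_B(y) \, d\theta(x,y) = 0.$$

Define the finite signed measure on $\mathcal{X}$,
$\nu_B(A) = \theta(A \times B)$. The above equation can be interpreted as the
embedding of this measure to $\mathcal{F}$,

$$\mu_{\nu_B} = \int_{\mathcal{X} \times \mathcal{Y}} \phi(x) \, I_B(y) \, d\theta(x,y) = 0,$$

hence $\nu_B = 0$, given that the embedding is injective. We conclude that
$\theta(A \times B) = 0$ for all Borel sets $A$, $B$, and hence $\theta = 0$. □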

An important point to note is that the embedding of $\theta$ need not be
characteristic to all probability measures: only the embeddings of each of the
individual dimensions $\mathcal{X}$ and $\mathcal{Y}$ need be characteristic. A
second point is that a consistent test still requires characteristic kernels on
both domains; it is not sufficient for one domain alone to have a
characteristic kernel. A simple example can be used to illustrate the resulting
failure mode: $\mathcal{X} := \mathbb{R}$ with a characteristic kernel,
$\mathcal{Y} := \mathbb{R}$ with the linear kernel $l(y_1, y_2) = y_1 y_2$, and
points distributed uniformly on a circular ring centered at the origin. The
data are dependent, but HSIC with these kernels will not detect this
dependence.
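This failure mode can be checked numerically. With the linear kernel on
$\mathcal{Y}$, the covariance operator reduces to
$E[Y \phi(X)] - E[Y] E[\phi(X)]$, and on the ring $E[Y \mid X] = 0$ by
symmetry, so both terms vanish. The sketch below assumes a ring of zero width
(the unit circle) sampled on an equally spaced grid of angles, a Gaussian
kernel of unit bandwidth as the characteristic kernel, and the standard biased
HSIC estimate; the function names are illustrative and not part of this note.

```python
import numpy as np

def gaussian_gram(z, bandwidth=1.0):
    """Gaussian kernel Gram matrix for a one-dimensional sample z."""
    diffs = z[:, None] - z[None, :]
    return np.exp(-diffs ** 2 / (2.0 * bandwidth ** 2))

def hsic_biased(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2

def ring_sample(n=200):
    """n equally spaced points on the unit circle (a ring of zero width)."""
    angles = 2.0 * np.pi * np.arange(n) / n
    return np.cos(angles), np.sin(angles)

x, y = ring_sample()
K = gaussian_gram(x)                            # characteristic kernel on X
hsic_linear = hsic_biased(K, np.outer(y, y))    # linear kernel on Y
hsic_gauss = hsic_biased(K, gaussian_gram(y))   # characteristic kernel on Y

print(f"linear kernel on Y:   {hsic_linear:.2e}")
print(f"Gaussian kernel on Y: {hsic_gauss:.2e}")
```

The grid is chosen so that the symmetry holds exactly in the sample; with
random draws from the ring, the linear-kernel statistic would instead be small
but non-zero, shrinking as the sample size grows.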

Acknowledgements: Thanks to Joris Mooij, Jonas Peters, Dino Sejdinovic,
and Bharath Sriperumbudur for helpful discussions.